Morpheme Segmentation in the METU-Sabancı Turkish Treebank

Author

  • Ruken Cakici
Abstract

This paper provides morphological segmentation data for the METU-Sabancı Turkish Treebank. The generalized lexical forms of the morphemes, which the treebank previously lacked, have been added to it. These data may be used to train part-of-speech (POS) taggers that use stemmer outputs to map these lexical forms to morphological tags.
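To illustrate why generalized lexical forms help a tagger, the sketch below assumes an archiphoneme notation of the kind common in Turkish morphological analysis (A for a/e, H for ı/i/u/ü, D for d/t) and a tiny, hypothetical table from lexical forms to tags such as A3pl, Loc and Abl. Neither the table nor the exact notation is taken from the paper, and the segmentation into suffixes is assumed to be given. Because vowel harmony and consonant voicing are collapsed, surface variants such as ler/lar or de/ta/da share one lexical form, so a single table entry covers all of them.

```python
# A minimal sketch, NOT the paper's data format or tool.
# Assumption: words are already segmented into suffixes; this only shows how
# surface suffixes can be generalized to lexical forms and then looked up.

# Hypothetical archiphoneme table: A = a/e, H = ı/i/u/ü, D = d/t.
# Real analyzers handle more alternations and condition them on context.
ARCHIPHONEMES = {
    "a": "A", "e": "A",
    "ı": "H", "i": "H", "u": "H", "ü": "H",
    "d": "D", "t": "D",
}

# Hypothetical lexical-form-to-tag lookup covering only three nominal suffixes.
LEXICAL_TO_TAG = {
    "lAr": "A3pl",  # plural
    "DA": "Loc",    # locative case
    "DAn": "Abl",   # ablative case
}


def generalize(surface_suffix: str) -> str:
    """Map a surface suffix such as 'ler' or 'tan' to a generalized lexical form."""
    return "".join(ARCHIPHONEMES.get(ch, ch) for ch in surface_suffix)


def tag_suffixes(suffixes: list[str]) -> list[str]:
    """Return one tag per suffix, or '?' when its lexical form is not in the table."""
    return [LEXICAL_TO_TAG.get(generalize(s), "?") for s in suffixes]


if __name__ == "__main__":
    # "evlerde" ('in the houses') segmented as ev + ler + de
    print(tag_suffixes(["ler", "de"]))   # ['A3pl', 'Loc']
    # "kitaplardan" ('from the books') segmented as kitap + lar + dan
    print(tag_suffixes(["lar", "dan"]))  # ['A3pl', 'Abl']
```

The flat character substitution is only a simplification; a real system would apply context-sensitive rules and restrict the substitution to suffix material.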

Similar resources

Use of Lexical Statistics for Compound Word Recognition and Segmentation in Turkish

Compound words are cross-linguistic morphological phenomena that occur in all languages. Compound words are widely accepted to be stored in the lexicon but their constituents need to be accessed during both language learning and production processes. In this study, the use of corpora was investigated for how to differentiate single-stem words from single-word compounds and then how to segment c...

Revising the METU-Sabancı Turkish Treebank: An Exercise in Surface-Syntactic Annotation of Agglutinative Languages

In this paper, we present a revision of the training set of the METU-Sabancı Turkish syntactic dependency treebank composed of 4997 sentences in accordance with the principles of the Meaning-Text Theory (MTT). MTT reflects the multilayered nature of language by a linguistic model in which each linguistic phenomenon is treated at its corresponding level(s). Our analysis of the METU-Sabancı synta...

ITU Validation Set for Metu-Sabancı Turkish Treebank

The Turkish Treebank (Oflazer et al., 2003; Atalay et al., 2003) created by the Middle East Technical University and Sabancı University has been available to researchers since 2003 and has been used by many researchers since then (Eryiğit and Oflazer, 2006; Eryiğit et al., 2006b; Eryiğit et al., 2006a; Nivre et al., 2007; Çakıcı and Baldridge, 2006; Buchholz and Marsi, 2006; Yüret, 2006; Wu et al., ...

Transition-based Dependency DAG Parsing Using Dynamic Oracles

In most dependency parsing studies, dependency relations within a sentence are presented as a tree structure. Whilst the tree structure is sufficient to represent the surface relations, deep dependencies which may result in multi-headed relations require more general dependency structures, namely Directed Acyclic Graphs (DAGs). This study proposes a new dependency DAG parsing appro...

A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus

This paper describes first steps towards extending the METU Turkish Corpus from a sentence-level language resource to a discourse-level resource by annotating its discourse connectives and their arguments. The project is based on the same principles as the Penn Discourse TreeBank (http://www.seas.upenn.edu/~pdtb) and is supported by TUBITAK, The Scientific and Technological Research Council of ...

Publication date: 2012